In this lab you will work directly on this notebook. You will be required to:
You can work in groups of two.
pdf (File -> Export Notebook As -> PDF)pdf and in ipynb (this file), name it "ISAT_labX-Name1_Name2" (with X the n. of the lab)The submission is done by sending the compressed folder to mauro.dalla-mura@gipsa-lab.grenoble-inp.fr. One submission per group.
The submission can take place in two steps.
The purpose of this lab session is to get familiar with some common techniques for the analysis of remote sensing hyperspectral images. We will analyze a hyperspectral image acquired on Washington DC Mall, Washington, DC, United States (https://goo.gl/maps/prYNey1N5z81ZtcE7). More information on the data are available in the Appendix of this notebook.
Tip: use the Summary to move into the notebook, to collapse or show cells, etc.
This notebook is divided into different parts composing a possible experimental analysis:
This cells contains all the packages and functions that are needed for executing all the cells of this notebook. Please feel free to add new import statements to customize the code and to make new additional experiments.
# Uncomment this for installing the required packages
# !pip install seaborn scikit-image scikit-learn
import scipy.io
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, precision_score, recall_score, f1_score
import matplotlib
from sklearn.preprocessing import MinMaxScaler, StandardScaler
from utils import imshow_stretch, gen_train_test, morph_prof
import seaborn as sns
import pandas as pd
Throughout the notebook, we will use some random generators. You may want to fix the seed in input to the PRNG* in order to have reproducible results throughout several trials.
* Pseudo-Random Number Generator: a deterministic algorithm for generating a pseudo-random sequence of numbers from a seed. By fixing the seed, you will obtain the same sequence once restarted the algorithm.
np.random.seed(42)
The variable data will contain the hyperspectral image.
Check the size of the image (num rows, num cols, num bands) by typing data.shape. The number of rows, columns and bands are stored in the variables n_rows, n_columns and n_bands, respectively.
data = scipy.io.loadmat('data/dcmall.mat')['data']
labels = scipy.io.loadmat('data/dcmall.mat')['test']
class_names = scipy.io.loadmat('data/dcmall.mat')['class_names']
class_colormap = scipy.io.loadmat('data/dcmall.mat')['class_colormap']
# images are reshaped in a vectorized fashion
n_rows, n_columns, n_bands = data.shape
data_array = data.reshape((n_rows * n_columns, n_bands))
labels = labels.reshape((n_rows * n_columns, 1)) # ground truth for the classification
# classes from 1 to 7 are taken for classification
orig_labels=np.copy(labels)
valid_indices = labels != 0
labels = labels[valid_indices]
data.shape, data_array.shape, labels.shape
((1280, 307, 191), (392960, 191), (8079,))
This part of the session is devoted to the visualization and inspection of the hyperspectral image acquired over Washington DC Mall.
imshow_stretch (e.g., if you want to see the 15th band
you should type: imshow_stretch(data[:,:,14])wavelengths.txt: it lists the correspondence between the band number
and the wavelenght* N.W.: this instruction should be followed by plt.show().
_ = plt.figure(figsize=(12,8), dpi= 100)
imshow_stretch(np.flipud(data[:,:,14].T), cmap="gray") # rotated for better visualization
plt.show()

By looking at the figure above, identify the band number (approximately) corresponding to the wavelength of the red, green and blue color. Show a true color composition (R, G, B) of the hyperspectral image by typing: imshow_stretch(data[:,:,(bR, bG, bB)]).
Red band = 650nm
From the file data/wavelengths.txt:
br, bg, bb = 48, 28, 11
rgb_bands = (br, bg, bb)
_ = plt.figure(figsize=(12,8), dpi= 100)
imshow_stretch(np.flipud(np.swapaxes(data[:,:,rgb_bands],1,0)))
plt.show()
bnir = 119
nirrg_bands = (bnir, br, bg)
nirrg_bands
(119, 48, 28)
_ = plt.figure(figsize=(12,8), dpi= 100)
imshow_stretch(np.flipud(np.swapaxes(data[:,:,nirrg_bands],1,0)))
plt.show()
The false color compositions looks overexposed comparing to the true color compositions.
In this part you will perform some simple data inspections in order to gain more insights about the data.
imshow_stretch(data[:,:,b]).
The central wavelenght corresponding to each band is reported in the file data/wavelengths.txt.imshow_stretch(np.flipud(np.swapaxes(data[:,:,0],1,0)))
plt.title("1th band")
plt.show()
imshow_stretch(np.flipud(np.swapaxes(data[:,:,1],1,0)))
plt.title("2th band")
plt.show()
imshow_stretch(np.flipud(np.swapaxes(data[:,:,data.shape[-1] // 2],1,0)))
plt.title("95th band")
plt.show()
imshow_stretch(np.flipud(np.swapaxes(data[:,:,data.shape[-1] - 1],1,0)))
plt.title("191th band")
plt.show()

atmo_bands = slice(0, 140, 1)
np.random.seed(42)
y = np.random.randint(0, data.shape[0])
x = np.random.randint(0, data.shape[1])
plt.plot(np.linspace(400, 2000, 140), data[y, x, atmo_bands])
plt.xlabel("Wavelength $\lambda$")
plt.ylabel("Energy")
plt.show()
We notice that the absorption range of atmosphere [400-1000]nm has an impact on the reflectance of a pixel.
Show a single band or a RGB composite of the image. You are encouraged to create a new figure by typing plt.figure() and to play with its input parameters figsize and dpi for having a larger (smaller) image and a higher (lower) resolution, respectively.
Move the cursor on the image. The top left corner is (0, 0) and the bottom right corner is (n_rows, n_columns).
plt.plot(data[row_1, col_1, :], 'b', label='pixel 1')
plt.plot(data[row_2, col_2, :], 'g', label='pixel 1')
# ...
plt.legend()
plt.show()
# -> Your code here
_ = plt.figure(figsize=(12,8), dpi= 100)
imshow_stretch(np.flipud(np.swapaxes(data[:,:,rgb_bands],1,0)))
plt.plot([400], [110], marker="x", color="blue", markersize=10) # Dirt
plt.plot([350], [270], marker="x", color="red", markersize=10) # Water
plt.plot([210], [210], marker="x", color="orange", markersize=10) # Tree
plt.plot([600], [90], marker="x", color="magenta", markersize=10) # Building
plt.show()
_ = plt.figure(figsize=(12,8), dpi= 100)
imshow_stretch(data[:,:,rgb_bands])
plt.plot([data.shape[1] - 110], [400], marker="x", color="blue", markersize=10) # Dirt
plt.plot([data.shape[1] - 270], [350], marker="x", color="red", markersize=10) # Water
plt.plot([data.shape[1] - 210], [210], marker="x", color="orange", markersize=10) # Tree
plt.plot([data.shape[1] - 90], [600], marker="x", color="magenta", markersize=10) # Building
# y (height), x (width)
dirt = (400, data.shape[1] - 110)
water = (350, data.shape[1] - 270)
tree = (210, data.shape[1] - 210)
building = (600, data.shape[1] - 90)
wave_space = np.linspace(400, 2473, data.shape[-1]) # Almost linear wavelength
plt.figure(figsize=(15, 5))
plt.plot(wave_space, data[water[0], water[1]], color="blue", label="water")
plt.plot(wave_space, data[dirt[0], dirt[1]], color="brown", label="dirt")
plt.plot(wave_space, data[tree[0], tree[1]], color="green", label="tree")
plt.plot(wave_space, data[building[0], building[1]], color="gray", label="building")
plt.legend(loc="best")
plt.xlabel("Wavelength $\lambda$")
plt.ylabel("Energy")
plt.show()
After 2000nm, the energy is the same for each class
In order to check the correlation among bands, we look at the correlation matrix of the data, which shows the cross-correlation between pairs of bands. We have already provided you with the code for its calculation and display. The colorbar at the right side shows the Pearson correlation coefficient $\rho$. Small values denote low correlation, while large values high correlation.
The bands [0-50] are heavily correlated with each other while the bands [50-100] are also correlated with each other. On the other hand, [0-50] and [50-100] are absolutely not correlated.
# %matplotlib widget
%matplotlib inline
Cm = np.corrcoef(data_array[range(0,data_array.shape[0],100),:], rowvar=False)
plt.imshow(Cm)
plt.colorbar()
plt.title('Band correlation matrix')
plt.show()
Check the dynamic of the data (range of values) by looking at the standard deviation of each band. We have already provided you with the code for its calculation and display.
The most "informative" bands are the one with the greatest standard deviation.
sigma = np.std(data_array, axis=0)
plt.figure()
plt.plot(sigma)
plt.xlabel('Bands')
plt.title('Band standard deviation')
plt.show()
plt.ylim([400, 800])
plt.plot(data_array[:500, np.argmin(sigma)], label="Greatest std band")
plt.plot(data_array[:500, np.argmax(sigma)], label="Lowest std band")
plt.xlabel("Slices")
plt.ylabel("Energy")
plt.legend()
plt.show()
In this part, you are asked to analyze how a reduction of the data dimensions (initially, there are 191 bands) can still retain most of the information linked to the data variability and under which conditions this may be possible.
You will make a Principal Component Analysis (PCA). Roughly speaking, the PCA is a linear transformation which reprojects the data in another space which axes are the direction of maximum variability. Therefore, the intuition behind this is that the transformation will retain the main variability (information) of the data.
In addition, given $N$ the dimensionality of the data, it is possible to select the first $ k \ll N$ principal components (which correspond to the eigenvectors associated to the $k$ largest eigenvalues of the covariance matrix of the data) for obtaining a representation of the data which retains as much variability as possible.
We have already provided you with the code for its calculation, but please take a little bit of time to analyze it and understand how the PCA works.
N.W.: this version of the PCA code is far unefficient with respect to ones that you can find in standard statistical libraries. In fact, the calculation of the covariance matrix become harder and harder as the dimensionality increases. Efficient methods overcome this calculation by recalling some properties which link the data to the eigenvalues and eigenvectors of the covariance matrix.
Sigma = np.cov(data_array, rowvar=False) # covariance matrix
[U, S, V] = np.linalg.svd(Sigma) # U: eigenvectors of Sigma, S: eigenvalues of Sigma
D = data_array - np.mean(data_array, axis=0) # center the data: rigid rotation
Dp = np.dot(data_array, U) # linear transformation
Inspect the eigenvalues (e.g., plot(S[:k]) for the first k eigenvalues). You will find out that the eigenvalues are sorted by decreasing value (the same sorting goes for the eigenvectors U). You can show that the sum of the eigenvalues is equal to the trace of the covariance matrix, that is the total variance (variance preservation).
Each eigenvalue accounts for a certain fraction of explained variance of the original data. Inspect which is that fraction of variance associated to the data
About how many components do you think are relevant for a good approximation of the data? Why?
# -> Your code here
plt.plot(S[:10])
np.allclose(np.cumsum(S)[-1], np.trace(Sigma))
True
th = 0.995
explained_variance = np.cumsum(S) / np.cumsum(S)[-1]
plt.plot(explained_variance)
k = np.argwhere(explained_variance>th)[0][0]
k
3
3 eigenvectors/eigenvalues can explain 99.5% of the data variance.
Show the principal components. In order to show them you need to reorganize the data in matrix format (e.g., PC1 = Dp[:, 0].reshape((n_rows, n_columns)) for the first PC).
Dp.max(), Dp.min()
(2911.1569161826765, -8035.422017144846)
# -> Your code here
fig, ax = plt.subplots(nrows=k+1, ncols=1, figsize=(10, 10))
for i in range(k):
ax[i].imshow(np.flipud(np.swapaxes(Dp[:, i].reshape(data.shape[:2]), 0, 1)), cmap="gray")
ax[i].axis("off")
ax[i].set_title(f"{i+1} PC (std: {np.std(Dp[:, i])})")
ax[-1].imshow(np.flipud(np.swapaxes(Dp[:, -1].reshape(data.shape[:2]), 0, 1)), cmap="gray")
ax[-1].axis("off")
ax[-1].set_title(f"{Dp.shape[-1]} PC (std: {np.std(Dp[:, -1])})")
plt.show()
The last eigenvalues has a very low range of value (very low standard deviation). The few values does make some sense. There are shapes of building. So the smallest eigemvalues does not necessarily noise but for sure holds a very few information.
Cp = np.cov(Dp, rowvar=False) and set very low values to zero
Cp[np.abs(Cp) < 1e-10] = 0. You should find out that the matrix is diagonal. What does it mean? What does it imply?)# -> Your code here
This part is devoted to perform a classification of the data. In particular, by relying on the spectral information of the bands, we want to set a "supervised" learning task: we will have a training set consisting of pixels and labels, where each label is going to describe the material of that pixel. The goal will be to train a classifier by using <pixel, label> pairs, such that it learns how to classify the material of a pixel by relying on its spectral features. The classifier will be then tested on a test set for assessing its performance.
We will take 40 randomly chosen samples from the labeled set for each class in order to perform the training. The remaining samples will be used for the test. For creating the two sets run the following cell.
classes = np.unique(labels)
n_classes = classes.shape[0]
n_samples_per_class = 40
tr_idx, ts_idx = gen_train_test(orig_labels, labels, classes, n_samples_per_class)
X_train, y_train = data_array[tr_idx].astype('double'), orig_labels[tr_idx].ravel()
X_test, y_test = data_array[ts_idx].astype('double'), orig_labels[ts_idx].ravel()
You will perform the training of the SVM on the training set and run the inference (classification) on the whole data, while you will evaluate the performance only on the test data.
The intuition behind the Support Vector Machine is the so-called "maximum margin classification". Imagine a problem with two well-separable classes in a 2D feature space. A linear SVM will classify the samples by drawing a straight line for separating the two classes, such that it will maximize the distance from the nearest points of the two classes, as shown in the figure below.

In more complex setting, SVM use a "soft margin" when the classes are not perfectly separable and will use the Kernel trick for drawing nonlinear decision shapes.

In this notebook, the SVM you will use will have a Radial Basis Function (RBF) Kernel (or Gaussian kernel), whose mathematical formulation is the following:
$K(\mathbf{x}, \mathbf{y}) = exp( \frac {\lVert \mathbf{x} - \mathbf{y} \rVert ^ 2} {2\sigma^2} )$, where often $\gamma = \frac{1}{2\sigma^2}$.
The $\gamma$ parameter controls the shape of the nonlinear decision boundary. Small values of gamma will correspond to smoother shapes around the support vectors (large standard deviation), while larger values of gamma will let the decision boundary to follow more and more the support vectors (small standard deviation) with the risk of overfitting the data.
Conversely, the $C$ parameter of SVM will (inversely) control the amount of regularization of the model. Large values of C will make the margin to be hard, with little or no flexibility for misclassified samples, while smaller values of C will let the margin to be soft and relax the maximal margin condition in the optimization function.
In this case, you will have these two free parameters to adjust for the classification in the second part of the exercise.
In addition, the data are scaled in order to better condition the underlying optimization problem.
N.W.: do not forget to scale (or inverse-scale) the data for avoiding inconsistent results!
# data scaling: they will have zero mean and unit standard deviation
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
# useful colormap tools for displaying the classification results
colors = [class_colormap[i] for i in range(len(class_colormap))]
cmap = matplotlib.colors.ListedColormap(colors, name='colors', N=None)
# reminder of the class colors
fig, ax = plt.subplots(figsize=(12,8))
color_matrix = np.zeros((200, 200 * n_classes))
for i in range(n_classes):
color_matrix[:, i * 200 : (i + 1) * 1200] = i + 1
ax.matshow(color_matrix, cmap=cmap)
for i in range(n_classes):
ax.text(100 + 200 * i, 100, class_names.ravel()[i], va='center', ha='center')
plt.show()
We can thus instantiate a Support Vector Classifier (SVC) object and train it with the training data (fit).
We choose for you $C=100$ and $\gamma$ inversely proportional to the dimensionality of the data (use the keyword parameter gamma='auto'). There are pretty interesting motivations behind the tuning of the $\gamma$ parameter, but they go beyond the scope of the exercise and therefore will not be addressed.
N.W.: for this section, do not forget to change the value of $\gamma$ according to the dimensionality of the data (e.g., if you are classifying by using only 5 bands, therefore the value of $\gamma$ should be 1/5).
# Classification with SVM
svm = SVC(kernel='rbf', C=100, gamma=1/data_array.shape[1])
svm.fit(X_train, y_train)
SVC(C=100, gamma=0.005235602094240838)
Once fitted, the model can be used in inference mode, that is it can be used for classifying new pixels. We use it first for classifying the whole image. Then we use it for evaluating the test pixels and on that set of samples we compute also the Overall Accuracy, which represents the fraction of well classified samples.
# Inference and accuracy calculation
pred = svm.predict(scaler.transform(data_array)) # classification of the total image
acc = svm.score(X_test, y_test) # TEST mean accuracy (n. correctly classified test samples / total n. test samples)
print('Mean TEST Overall Accuracy (OA):', acc)
Mean TEST Overall Accuracy (OA): 0.9790998846005898
We can display the classification result.
# Classification result visualization
plt.figure(figsize=(12,8), dpi= 100)
plt.imshow(np.flipud(pred.reshape(n_rows, n_columns).T), cmap=cmap) # rotated for better visualization
plt.title('SVM classification result')
plt.show()
If we only analyze the overall accuracy (OA) and we only look at the classified image, we could achieve misleading results, even if the image looks fine "by eye" and the OA is close to 100%, as in this case.
A useful tool for inspecting the complete classification result is the Confusion Matrix. The generic element $C_{i,j}$ of this matrix represents the number of samples that is known to belong to class $i$ and has been predicted as belonging to class $j$. We can now note that the overall accuracy is, in fact, the sum of the element on the diagonal of the confusion matrix. The out-of-diagonal elements are all those misclassified samples which were assigned to a class $j$ but belonged to class $i$ or vice-versa (see the examples in the cells below for better understand).
\ \ From the confusion matrix we can define further indexes, which are general concepts but we'll particularize them to our case.
The recall is also called "sensitivity": it measures the fraction of positives which are correctly identified. The precision is also called "specificity": it measures the fraction of negatives which are correctly identified.
An example of interpretation of these two indexes is provided just below.
In a real scenario, both of these two indexes cannot be both high at the same time, although it is ideally what we would like.
\ Both these two indicators are complementary to the overall accuracy. Their harmonic mean is called F1-score (or F-score) $F = \frac{2}{\frac{1}{R} + \frac{1}{P}}$ and constitutes a more robust index for the assessment of classification results.
# Confusion matrix analysis
cl = ['Roof', 'Street', 'Path', 'Grass', 'Trees', 'Water', 'Shadow']
y_predicted = svm.predict(X_test)
cm =confusion_matrix(y_test, y_predicted)
df_cm = pd.DataFrame(cm, index = [i for i in cl],
columns = [i for i in cl])
plt.figure(figsize = (10,7))
sns.heatmap(df_cm, annot=True, fmt='g', cmap='Blues')
plt.xlabel('Predicted classes')
plt.ylabel('Actual classes')
plt.title('Confusion matrix')
plt.show()
# further indices
p = precision_score(y_test, y_predicted, average=None)
r = recall_score(y_test, y_predicted, average=None)
fs = f1_score(y_test, y_predicted, average=None)
print("Recalls:", r)
print("Precisions:", p)
print("F1-Scores:", fs)
Recalls: [0.96626252 0.95212766 1. 0.99576271 0.98630137 0.99831081 0.96491228] Precisions: [1. 0.81179138 0.73770492 0.9984068 0.97826087 1. 0.72368421] F1-Scores: [0.98284182 0.87637699 0.8490566 0.997083 0.98226467 0.99915469 0.82706767]
Let's take the class 1 (Roof) as an example. The recall is a measure of the model's ability to correctly classify the positive samples. By looking at the confusion matrix, we can look at the first row, in correspondence of the samples that we know they belong to class 1. In fact, 42 of that samples were classified as Street and 26 as Path. Therefore we have $42 + 26$ false negatives, that is samples which belong to Roof class but which were erroneously classified. This means that $R_{Roof} = \frac{3726}{3726 + 42 + 26} = 0.98$.
Conversely, if we look at the Path class, we know that 134 samples were correctly classified, but if we look at its column, we see that 26 samples have been classified as Path but they belonged to the Roof class. Therefore, this 26 samples are false positives, that is samples that have been classified as correct but they were not. This means that $P_{Path} = \frac{134}{134+26}=0.83$.
Perform the same procedures but considering only
Comment the results.
\ Look again at the plot of the standard deviation of the bands. Perform a classification excluding ranges of bands with low (high) standard deviation.
\ [Optional] Perform now a classification by using the data on which you applied PCA.
[Optional] Perform some of the tests already done considering a different number of training samples (change the second input parameter of gen_train_test).
[Optional] Perform further test by changing:
the kernel (you can either instantiate the object LinearSVC or using SVC adding the keyword parameter kearnel='linear').
How the change in these parameters affect the classification? Comment the results.
# TODO --> use this cell or add new cells below
Till now the classification performed on the data was only relying on the spectral characteristics of the pixels. No exploitation of the spatial arrangement of the pixels was done. In order to take this complementary source of information into account, we can consider “spatial features”: features that model the spatial information (e.g., spatial correlation of neighboring pixels, geometrical characteristics of the objects, etc). Once computed, these features can be considered by a classifier.
There are many ways to extract spatial features. In this lab session we will consider the features derived by an Extended Morphological Profile (EMP) (see the paper ClassificationEMP.pdf in the doc folder for more information). The EMP is composed by a concatenation of Morphological Profiles (MPs). An MP is a sequence of opening and closing by reconstruction with a structuring element of fixed shape and increasing size and can be considered as a multiscale representation of the image. A representation is given in the image below.

The EMP is obtained by concatenating different MPs each one computed on one of the first Principal Components of the image (see figure below).

The MP is implemented by the function morph_prof in the file utils.py.
The EMP is obtained by considering the first principal components of the image. The example below shows how to build an EMP considering the first 3 PCs, with MPs computed with circular structuring elements with initial radius 1, increment step size of 3 and 4 levels (thus, leading to an MP of 9 = 4 closings + original image + 4 opening)
EMP = np.zeros((n_rows, n_columns, 27))
for i in range(3):
MP = morph_prof(Dp[:,i].reshape(n_rows, n_columns), 1, 3, 4)
EMP[:,:,9 * i : 9 * (i + 1)] = MP
# TO DO --> visualize the EMP and give some comments
fig, ax = plt.subplots(nrows=5, ncols=3, figsize=(15, 7), dpi=100)
for i in range(3):
ax[0, i].set_title(f"Eigenvalues {i+1}th")
ax[0, i].imshow(np.flipud(np.swapaxes(EMP[:, :, 9 * i], 0, 1)), cmap="gray")
ax[1, i].imshow(np.flipud(np.swapaxes(EMP[:, :, 9 * i + 1], 0, 1)), cmap="gray")
ax[2, i].set_title(f"Original slice")
ax[2, i].imshow(np.flipud(np.swapaxes(EMP[:, :, (9 * i + 9 * (i + 1)) // 2], 0, 1)), cmap="gray")
ax[3, i].imshow(np.flipud(np.swapaxes(EMP[:, :, 9 * (i + 1) - 2], 0, 1)), cmap="gray")
ax[4, i].imshow(np.flipud(np.swapaxes(EMP[:, :, 9 * (i + 1) - 1], 0, 1)), cmap="gray")
_ = [ax[j][i].axis("off") for j in range(5) for i in range(3)]
The EMP images are smoothed out version of the orignal projection and of invert orignal projection.
Perform a classification considering the EMP. Note that the EMP, after its computation, is a stack of images. Thus, it should be converted in vector format for obtaining a vector whose shape is (n_rows * n_columns, num levels of EMP).
# TODO --> use this cell or add new cells below
EMP = EMP.reshape((-1, EMP.shape[-1]))
classes = np.unique(labels)
n_classes = classes.shape[0]
n_samples_per_class = 40
tr_idx, ts_idx = gen_train_test(orig_labels, labels, classes, n_samples_per_class)
X_train, y_train = EMP[tr_idx].astype('double'), orig_labels[tr_idx].ravel()
X_test, y_test = EMP[ts_idx].astype('double'), orig_labels[ts_idx].ravel()
scaler = MinMaxScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)
svm = SVC(kernel='rbf', C=100, gamma=1/EMP.shape[1])
svm.fit(X_train, y_train)
SVC(C=100, gamma=0.037037037037037035)
# Inference and accuracy calculation
acc = svm.score(X_test, y_test) # TEST mean accuracy (n. correctly classified test samples / total n. test samples)
print('Mean TEST Overall Accuracy (OA):', acc)
Mean TEST Overall Accuracy (OA): 0.996025131427106
# Classification result visualization
pred = svm.predict(scaler.transform(EMP)) # classification of the total image
plt.figure(figsize=(12,8), dpi= 100)
plt.imshow(np.flipud(pred.reshape(n_rows, n_columns).T), cmap=cmap) # rotated for better visualization
plt.title('SVM classification result')
plt.show()
# Confusion matrix analysis
cl = ['Roof', 'Street', 'Path', 'Grass', 'Trees', 'Water', 'Shadow']
y_predicted = svm.predict(X_test)
cm =confusion_matrix(y_test, y_predicted)
df_cm = pd.DataFrame(cm, index = [i for i in cl],
columns = [i for i in cl])
plt.figure(figsize = (10,7))
sns.heatmap(df_cm, annot=True, fmt='g', cmap='Blues')
plt.xlabel('Predicted classes')
plt.ylabel('Actual classes')
plt.title('Confusion matrix')
plt.show()
# further indices
p = precision_score(y_test, y_predicted, average=None)
r = recall_score(y_test, y_predicted, average=None)
fs = f1_score(y_test, y_predicted, average=None)
print("Recalls:", r)
print("Precisions:", p)
print("F1-Scores:", fs)
Recalls: [0.9939378 0.9893617 1. 1. 1. 0.99915541 0.94736842] Precisions: [0.99946992 0.93233083 1. 1. 1. 1. 0.96428571] F1-Scores: [0.99669618 0.96 1. 1. 1. 0.99957752 0.95575221]
Comment the results. What do you expect? Does the classifier exploit the spatial features? This let the model achieve better performances? Comment all this aspects. Are there any areas in the image in which you clearly see a better classification? Do you figure out why?
[Optional] Perform a classification of each MP. Change the settings of the MP (size of the SE, number of levels) and see the changes in the classification.
[Optional] Perform a classification concatenating the EMP and the original image.
Let us suppose now that no prior information on the land cover classes of interest are available. In this case one could perform an unsupervised analysis in order to identify if there are some groupings/clusters in the data. Data belonging to the same cluster share some common properties, e.g., the type of land cover (hopefully). Conversely to supervised classification, the semantic meaning of these classes should be determined a posteriori once the clusters are found.
Question Run the unsupervised clustering algorithm k-means on the data.
Look at the obtained results (e.g., spectrum corresponding to the mean of each cluster and spatial arrangement of the class).
Question [optional] Run the same analysis considering both the spectral and spatial features as done for the supervised classification done in the previous part.
We now consider the problem of spectral unmixing.
The goal is to obtain the spectral signatures, called endmembers, of the materials present in the image, and then, estimate their fractional abundances. We will apply some well known spectral unmixing algorithms to synthetic and real datasets. In particular, we will consider algorithms that assume a linear mixing model (LMM).
Some reference papers are in the folder doc.

The figure shows a graphical representation of the linear mixing model widely employed to study the formation of hyperspectral images and their spectral unmixing. The incident light reflected by the surface (displayed in the zoom-in), is collected by the hyperspectral sensor as a signal at each wavelength. The LMM assumes that the surface can be divided in homogeneous regions of different materials. The light beams bounce on these separate patches and go to the sensor without any additional interaction. The values obtained by the sensor correspond to the sum of the light of each beam.
In a formal way, the hyperspectral pixel, $\mathbf{r}\in\mathbb{R}^L$, where $L$ is the number of spectral bands, is formed as a linear combination of the spectral characterization of the materials weighted by their relative abundance and additive noise: \begin{equation} \mathbf{r} = \sum_{p=1}^{P} \mathbf{e}_p a_p + \mathbf{n}, \label{eq:lmm} \end{equation} where $\mathbf{e}_p\in\mathbb{R}^L$ denotes the spectral signature of the $p$-th material, and $a_p\in\mathbb{R}$ denotes the spatial abundance of the respective material. Normally, the spectral signature corresponds to the radiance or the reflectance properties of the materials. The fractional abundances are always non-negative (since it is meaningless to have a negative abundance of any material). Another usual constraint is to assume that the abundances of all the material present in a hyperspectral pixel sum up to one: \begin{equation} \sum_{p=1}^P a_p = 1. \label{eq:ASC} \end{equation}
The LMM is a simple approximation model. In real life, the materials in the surface are not separated in homogeneous regions and the signal collected by the sensor is non-linear. The non-linearities are due to the intimate mixture of the materials that leads to multiple scatterings as it is shown in the figure below.

However, the LMM is a good approximation, that serves well to many applications, and presents very nice analytical properties.
We will explore a small library of spectral signatures of materials provided by the U.S. Geological Survey laboratory (http://www.usgs.gov/). Researchers at the U.S; Geological Survey Spectroscopy Lab have measured the spectral reflectance of hundreds of materials in the lab, and have compiled a spectral library known as the USGS Spectral Library (http://speclab.cr.usgs.gov/spectral-lib.html). The libraries are used as references for material identification in remote sensing images.
Question Load and display some spectra in the USGS library. and show the spectrum of the mixture of two materials with different coefficients. Some of the materials present high values of reflectance in most of the wavelengths (bright reflectance) while others absorb or transmit most of the energy (low reflectance). Plot the spectrum of some materials and look at their reflectance values (you can notice intervals of the wavelengths in which the material reflects, absorb etc).
The spectral signatures of the materials could be affected by illumination inhomogeneities, (e.g., shadows), and by the angle of incidence of the light beams (i.e., by the geometry of the surface such as the presence of plains, valleys, mountains, etc). In any case, the shape of the spectral signature of a given material should not be substantially modified. In general, we are more interested in the shape of the spectral signature than in its amplitude.
Question Group (cluster) the materials according to their pair-wise Euclidean and angular distances}: do not spend much time on this. For instance, for each endmember, you can divide the pair-wise distances in three groups, the close range one for values between 0 and 0.1, the middle range one for values between 0.1 and 0.2, and the far range one for values greater than 0.2.
Question Simulate a simple model producing a modification of the spectral signature due to a change in illumination as a (random) \textit{multiplicative scaling factor} that applies to each endmember in the library. Compute the pair-wise Euclidean and angular distances of the scaled endmembers and compare them to the distances computed on the original set of endmembers. Make some comments.
Finally, we will simulate data according to the LMM. For a given number of materials, $P$, a hyperspectral pixel can be simulated by linearly combining $P$ random endmembers selected from the USGS spectral library using Eq.~\eqref{eq:lmm}. The goal here is to get familiar with the LMM.
# Your code here
Consider the estimation of endmembers by a geometric approach such as Vertex Component Analysis [Jose M. P. Nascimento and Jose M. B. Dias, "Vertex Component Analysis: A Fast Algorithm to Unmix Hyperspectral Data" submited to IEEE Trans. Geosci. Remote Sensing, 2004]. An implementation of VCA in python is available at:# Your code here https://github.com/Laadr/VCA
Question Run VCA and discuss the results obtaines. Can you recognize any material identified as endmembers? How to chose the number of endmembers to extract?
# Your code here
Once we have estimated the endmembers from the data, we follow by estimating their corresponding fractional abundances. In order to do that, we define an optimization problem: \begin{equation} \mathbf{a}^{*} = \arg\min_{\mathbf{a} \geq 0} \left\| E\mathbf{a} - \mathbf{r} \right\|_{2}^{2}, \label{eq:opt1} \end{equation} where $r\in\mathbb{R}_{+}^{L}$ denotes a data sample, $E\in\mathbb{R}_{+}^{L \times P}$ denotes the estimated endmembers and $\mathbf{a}\in\mathbb{R}_{+}^{P}$ denotes the abundances. The constrained optimization problem can not be solved analytically, but it has an unique solution that can be obtained by performing a non negative least squares.
The solution to the above optimization problem does not take into account the sum-to-one constraint. In order to do that, we need to solve the following optimization problem: \begin{equation} \mathbf{a}^{*} = \arg\min_{\mathbf{a} \geq 0} \left\| E\mathbf{a} - \mathbf{r} \right\|_{2}^{2},~\textrm{s.t.}~\sum_{p=1}^P a_p = 1. \label{eq:opt2} \end{equation} Solving the above equation can be easily done by \texttt{lsqnonneg} function, making the following changes: $\tilde{E} = \left[E; \mathbf{1}_P\right]$ and $\tilde{\mathbf{r}} = \left[ \mathbf{r}; 1 \right]$.
The procedure presented here is based on the estimation of the endmembers from the data and the retrieval of their abundances. We consider now an alternative approach. Let us assume to have at our disposal a spectral library containing potentially many spectra. The goal is to estimate the abundances from such large pool of endmembers. The problem we face in this case, is that when solving the inversion problem with an endmember matrix with many entries usually leads to a solution with a reconstruction error very low since being the (intrinsic) dimensionality of the data lower than the number of spectra in the library. In order to avoid solutions that are acceptable from a representation point of view (low reconstruction error) but with not much real significance a general rule is to force that only few endmembers are active for each pixel. This accounts for an estimation of the abundances in which only few abundances are different from zero for each pixel. This constraint can be imposed in the abundance estimation algorithm by imposing that abundances are sparse (contains many zeros).
The estimation of the abundances considering different constraints (e.g., non-negativity, sum to one and sparsity) can be performed by the algorithm: Sparse unmixing via variable splitting and augmented Lagrangian methods (SUNSAL). A python implementation of SUNSAL is available at: https://github.com/Laadr/SUNSAL
Question Estimate the abundances considering a set of known endmembers (e.g., estimated in the previous part or coming from the spectral library)
# Your code here
Consider now the joint estimation of endmembers and abundances with constraints of non-negativity.
This problem can be approaches by Non negative matrix factorization (NMF).
An implementation of NMF is available in scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html#sklearn.decomposition.NMF)
Question Compute spectral unmixing by considering NMF. Compare the obtained results with the results in the previous parts.
# Your code here
Question [optional] Test the different approaches tested so far on different images (e.g., cuprite).
# Your code here
The hyperspectral image of Washington DC mall will be used in this lab section (Figure 5) . The image was acquired by the airborne mounted Hyperspectral Digital Imagery Collection Experiment (HYDICE) sensor over the the Mall in Washington, DC. The image comes along with MultiSpec (a freeware multispectral image data analysis system developed at Purdue University) and is available at https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html. It is provided with the permission of Spectral Information Technology Application Center of Virginia who was responsible for its collection. The sensor system used in this case measured pixel response in 210 bands in the 0.4 to 2.4 μm region of the visible and infrared spectrum. Bands in the 0.9 and 1.4 μm region where the atmosphere is opaque have been omitted from the data set, leaving 191 bands. The data set contains 1208 scan lines with 307 pixels in each scan line with a spatial resolution of about 2.8 m. Seven thematic land cover classes are present in the scene: Roofs, Street, Path (graveled paths down the mall center), Grass, Trees, Water, and Shadow.
Below the Washington DC Mall data set is shown (false color composite).
